24 research outputs found

    Segmental and prosodic improvements to speech generation

    System demonstration GoalGetter: generation of spoken soccer reports

    In this paper we describe a demonstration of the GoalGetter system, which generates spoken soccer reports (in Dutch) on the basis of tabular data. Two types of speech output are available. The demo runs via the web. It includes the possibility of 'creating your own match' and having GoalGetter generate a report on this match. 1. About the system The GoalGetter system is a Data-to-Speech system which generates spoken soccer reports (in Dutch) on the basis of tabular data. The system takes as input data about a soccer match that are derived from a Teletext page. The output of the system is a spoken, natural language report conveying the main events of the match described on the Teletext page.

    High-Quality Speech Output Generation through Advanced Phrase Concatenation

    This paper describes a method for generating natural sounding speech, called phrase concatenation, which is used in a telephone inquiry system that provides train timetable information. The concatenation technique used combines pre-recorded words and phrases, but is new in that it involves the recording of several prosodically different versions of otherwise identical phrases. Although no formal evaluation has taken place yet, we feel confident in saying that the output meets high quality standards and approaches the quality of natural speech. 1. INTRODUCTION During the last decade, the performance of spoken dialogue systems has improved substantially. At the moment it is possible to support a number of simple practical tasks in limited domains. As a result, many telephone-based information systems are being developed in different countries. The practical goal of the NWOTST Priority Programme is to build a prototype of a Dutch train timetable information system. The system is call..

    On the performance of speech output in a practical setting

    In spoken dialogue systems, in which humans interact with computers over the telephone, it is essential that the voice output of the system be of high quality. Both the intelligibility and the naturalness of the output should be sufficiently high. There are several techniques for providing a system with speech output, each with its own advantages and disadvantages. This paper discusses a formal evaluation experiment of three speech output techniques. Natural speech was included as a reference condition. The speech was rated on intelligibility and fluency of the output. Additionally, the overall quality of the speech and its suitability for use in a commercial application were assessed. The results reveal significant differences between the techniques. Diphone synthesis still has an inferior quality compared to the other techniques, both in terms of intelligibility and fluency. Conventional phrase concatenation is quite intelligible, but scores less on fluency. IPO's phrase concatenation is by far the best technique.

    Improving diphone synthesis by adding context-sensitive diphones

    One well-known problem with concatenative synthesis is the occurrence of audible discontinuities at concatenation points. Formant jumps across concatenation points suggest the problem is due to spectral differences. In a previous experiment (Klabbers & Veldhuis, 1998), the results of a listening experiment were correlated with several spectral distance measures to find one that best predicts the audible discontinuities. The Kullback-Leibler distance proved to be the best measure. In this paper we demonstrate its use for clustering diphones with similar contexts. For each of the clusters, a limited number of context-sensitive diphones is added to the database to reduce the number of audible discontinuities. A new listening experiment was performed, which showed that a significant improvement can be obtained.
